Research on Deep Web Query Interface Clustering Based on Hadoop
Abstract
Clustering different query interfaces effectively is one of the core problems in generating an integrated query interface in the Deep Web integration domain. With the rapid development of Internet technology, however, the number of Deep Web query interfaces has grown explosively. As a result, traditional stand-alone Deep Web query interface clustering approaches run into bottlenecks in both time complexity and space complexity. Building on a study of the Hadoop distributed platform and the MapReduce programming model, this paper designs and implements a Deep Web query interface clustering algorithm on the Hadoop platform. The algorithm uses the Vector Space Model (VSM) and Latent Semantic Analysis (LSA) to represent “Query Interfaces-Attributes” relationships. Experimental results show that, by using the Hadoop architecture, the proposed algorithm achieves better scalability and a better speedup ratio.
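The abstract names the building blocks (a VSM "Query Interfaces-Attributes" matrix, LSA, and a MapReduce-based clustering step) but not the exact clustering algorithm or term weighting. The following is a minimal sketch under loudly stated assumptions, not the authors' implementation: attribute weights are binary, LSA is computed as a truncated SVD by power iteration with deflation (adequate only for toy inputs), and k-means with farthest-point initialisation stands in for the unspecified clustering algorithm. The Map and Reduce phases are simulated in one Python process rather than run on Hadoop. All function names (`build_vsm`, `lsa`, `kmeans_map`, `kmeans_reduce`, `cluster`) and the sample interfaces are hypothetical.

```python
import math
import random
from collections import defaultdict

# ASSUMPTIONS (not from the paper): binary weights, power-iteration LSA,
# k-means as the clustering step, MapReduce simulated in-process.


def build_vsm(interfaces):
    """Binary 'Query Interfaces-Attributes' matrix: one row per interface."""
    vocab = sorted({a for attrs in interfaces for a in attrs})
    matrix = [[1.0 if a in attrs else 0.0 for a in vocab] for attrs in interfaces]
    return vocab, matrix


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def lsa(matrix, k, iters=200, seed=0):
    """Project rows onto the top-k right singular vectors (power iteration on A^T A)."""
    n = len(matrix[0])
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    basis = []
    for _ in range(k):
        v = _normalize([rng.random() for _ in range(n)])
        for _ in range(iters):
            v = _normalize([sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)])
        lam = sum(v[i] * sum(gram[i][j] * v[j] for j in range(n)) for i in range(n))
        # Deflate: remove the found direction so the next pass finds the next one.
        gram = [[gram[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
        basis.append(v)
    return [[sum(r * b for r, b in zip(row, v)) for v in basis] for row in matrix]


def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_map(points, centroids):
    """Map phase: emit (nearest-centroid id, point) for every point."""
    for p in points:
        yield min(range(len(centroids)), key=lambda c: dist2(p, centroids[c])), p


def kmeans_reduce(pairs):
    """Shuffle + reduce phase: group points by key and average each group."""
    groups = defaultdict(list)
    for key, p in pairs:
        groups[key].append(p)
    return {key: [sum(col) / len(ps) for col in zip(*ps)] for key, ps in groups.items()}


def cluster(points, k, max_rounds=20):
    # Deterministic farthest-point initialisation (an assumption).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    for _ in range(max_rounds):  # one MapReduce job per round
        new = kmeans_reduce(kmeans_map(points, centroids))
        updated = [new.get(i, centroids[i]) for i in range(k)]
        if updated == centroids:
            break
        centroids = updated
    return [key for key, _ in kmeans_map(points, centroids)]


# Hypothetical sample: three book-search and three airfare-search interfaces.
interfaces = [
    {"title", "author", "isbn"},
    {"title", "author", "publisher"},
    {"title", "isbn", "publisher"},
    {"from", "to", "departure"},
    {"from", "to", "passengers"},
    {"from", "departure", "passengers"},
]
vocab, matrix = build_vsm(interfaces)
points = [_normalize(p) for p in lsa(matrix, k=2)]  # cosine-style normalisation
labels = cluster(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

On Hadoop, `kmeans_map` would correspond to a Mapper that reads the current centroids and `kmeans_reduce` to a Reducer that recomputes them, with one job per iteration. The LSA step shown here runs on a single machine and would itself need a distributed SVD at real scale.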
Similar articles
Deep Web Integrated Query Interface Construction Method Based on Apriori Algorithm
Deep Web contains numerous data resources and has been a hot topic in the database research field. Much research has focused on Deep Web query interface discovery, form information extraction, etc. However, very few studies have addressed Deep Web integrated query interface construction so far. This paper provides an integrated interface construction method based on Apri...
Traitor: Associating Concepts using the World Wide Web
We use Common Crawl’s 25TB data set of web pages to construct a database of associated concepts using Hadoop. The database can be queried through a web application with two query interfaces. A textual interface allows searching for similarities and differences between multiple concepts using a query language similar to set notation, and a graphical interface allows users to visualize similarity...
Comparison of Clustering Methods over a Hidden Web Data using Stratification
This paper focuses on the problem of data mining (in general) and clustering (in particular) on hidden web data. Data mining is a process that analyzes and extracts knowledge from large amounts of data to provide useful information to users. Hidden or deep web data is the database located at a remote system. So, to access such data, we need a query interface or HTM...
Modeling and Extracting Deep-Web Query Interfaces
Interface modeling & extraction is a fundamental step in building a uniform query interface to a multitude of databases on the Web. Existing solutions are limited in that they assume interfaces are flat and thus ignore the inherent structure of interfaces, which then seriously hampers the effectiveness of interface integration. To address this limitation, in this chapter, we model an interface ...
Identification and Classification of Deep Web Query Interfaces via Ontology
In order to obtain the large quantities of valuable information on Deep Web, it is required to discover the related individual query interface and design the integrated query interface on which user query request can be submitted. The key challenges are to identify and classify the Deep Web query interface accurately. In view of the regular data of Deep Web, we consider to construct the Deep We...
Journal title:
- JSW
Volume 9, Issue
Pages -
Publication date: 2014